Development of a Real-time ASR System for Slovak SpeechDat Database
Abstract
This paper describes the development of a real-time Slovak speech recognition system for voice-operated telephone services. The system is based on the SPHINX2 platform. The decoder, which uses Hidden Markov Models, was trained on the SpeechDat-E Slovak database. It is a speaker-independent, large-vocabulary, continuous-speech, real-time automatic speech recognition system. Test results are given for the test groups of isolated digits, connected isolated digits, application words, phonetically rich words, and city (plus proper) names. The word error rates achieved range from 3.73% (connected isolated digits, vocabulary of 11 words) to 15.72% (city names, vocabulary of 927 words).
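The abstract reports word error rates but does not say how they were scored. As a minimal sketch, assuming the conventional definition WER = (substitutions + deletions + insertions) / number of reference words, the metric can be computed with a word-level edit distance. The function name `word_error_rate` and the sample Slovak digit strings are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Hypothetical helper for illustration; the paper's actual scoring
    tool is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One reference word ("tri") is missing from the hypothesis: 1 error / 4 words.
print(word_error_rate("jeden dva tri štyri", "jeden dva štyri"))  # 0.25
```

Because insertions count as errors, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.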
Similar resources
Recording of Czech and Slovak Telephone Databases within SpeechDat-E
Databases of five East European languages (Czech, Slovak, Russian, Polish and Hungarian) are being created within the SpeechDat-E project. This paper describes the overall design of the SpeechDat-E databases and concentrates on the Czech (1000 speakers) and Slovak (1000 speakers) databases. The item structure and recording specifications are presented. A more detailed description is included for the language-sp...
Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque
This paper introduces two databases specifically designed for the development of ASR technology for the Basque language: the Basque Speecon-like database and the Basque SpeechDat MDB-600 database. The former was recorded in an office environment according to the Speecon specifications, whereas the latter was recorded through mobile telephones according to the SpeechDat specifications. Both datab...
Crosslingual and bilingual speech recognition with Slovak and Czech SpeechDat-E databases
This paper presents work on crosslingual and bilingual speech recognition carried out with the SpeechDat databases for the Czech and Slovak languages. The work follows the MASPER initiative, which was formed as part of the COST 278 Action. In the crosslingual experiments, expert-driven and data-driven approaches were used for transferring monolingual source acoustic models to a target language. Th...
SpeechDat(E) - Eastern European Telephone Speech Databases
This paper describes the creation of five new telephony speech databases for Central and Eastern European languages within the SpeechDat(E) project. The five languages concerned are Czech, Polish, Slovak, Hungarian, and Russian. The databases follow the SpeechDat-II specifications with some language-specific adaptation. The present paper describes the differences between SpeechDat(E) and earlier Speec...
SpeechDat-E: five Eastern European speech databases for voice-operated teleservices completed
In the SpeechDat-E project, five medium-large telephone speech databases have been collected for Czech, Hungarian, Polish, Russian, and Slovak. The project was recently concluded. This paper reports briefly on the contents of the databases and elaborates on experiences gained from the data recordings and from the validation of the databases. The availability of the databases to the public is addres...
Publication date: 2005